Building Thesaurus from Manual Sources and Automatic Scanned Texts
Abstract
This paper describes work done in the TIPS project on the construction of a thesaurus base. The base is built by merging a manually built thesaurus with one automatically extracted from large text corpora. Several manually built thesauri have been semi-formatted so that they can be merged into a consistent common base. The automatic extraction relies on both syntax and statistics. We present the way the thesauri are built and the results obtained on a scientific corpus in the context of the TIPS project.
Similar resources
Multilingual Ontology Enrichment for Semantic Annotation and Retrieval of Medical Information
Background: Knowledge management in the European project Noesis addresses concept-based annotation and multilingual Information Retrieval of documents. Objective: Multilingual enrichment of a concept-based terminology in the medical field. Experience and evaluation in the domain of cardiovascular diseases by enriching a subset of the MeSH thesaurus in six European languages. This terminology, r...
Methodology For Building Thematic Indexes In Medicine For French
The aim of this project is to propose a methodology for automatically building a thematic index from French medical texts in order to improve the IR process. In this article, we focus on the selection process of relevant terms. Contrary to Bourigault and Charlet (1999) who defend a statistical method followed by human intervention, we propose an automatic method that takes advantage of available a...
Conceptual Business Process Structuring by Extracting Knowledge from Natural Language Texts
This article discusses methods of constructing a formalized structure of a subject domain based on analysis of natural language texts, including discovering objects, their properties and related actions, followed by discovering business processes specific to the subject domain and the formation of thesaurus and business processes of the subject domain. At the same time the thesaurus can be chan...
Construction of Thematic Representations of Texts Based on Domain-Specific Thesaurus
The paper considers interrelations between lexical cohesion and the thematic structure of a text. The technique of automatic construction of the thematic representation of the text contexts is described. The technique uses knowledge from the Sociopolitical thesaurus, which was specially developed as a tool for automatic text processing.
MeSH Up: effective MeSH text classification for improved document retrieval
MOTIVATION Controlled vocabularies such as the Medical Subject Headings (MeSH) thesaurus and the Gene Ontology (GO) provide an efficient way of accessing and organizing biomedical information by reducing the ambiguity inherent to free-text data. Different methods of automating the assignment of MeSH concepts have been proposed to replace manual annotation, but they are either limited to a small...